Task-aware deep bottleneck features for spoken language identification

Authors

Bing Jiang

Yan Song

Si Wei

Ian Vince McLoughlin

Li-Rong Dai

Abstract

Recently, deep bottleneck features (DBF), extracted from a deep neural network (DNN) containing a narrow bottleneck layer, have been applied to language identification (LID) and yielded significant performance improvements over state-of-the-art methods on NIST LRE 2009. However, the DNN is trained on a large corpus in a specific language, which is not directly related to the LID task. More recently, lattice-based discriminative training methods for extracting more targeted DBF were proposed for automatic speech recognition (ASR). Inspired by this, this paper proposes to tune the parameters of the already-trained DNN using an LID-specific training corpus, which may make the resulting features, termed discriminative DBF (D2BF), more discriminative and task-aware. Specifically, the maximum mutual information (MMI) criterion is applied, with gradient descent, to update the bottleneck-layer parameters of the DNN iteratively. We evaluate the proposed D2BF with different back-end models, including GMM-MMI and i-vector, on the six most confused languages selected from NIST LRE 2009. The results show that the proposed D2BF is more appropriate and effective than the original DBF.
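The abstract does not give the exact objective or network, so the following is only a minimal sketch of the general idea (MMI tuning of a bottleneck layer against per-language back-end models), under strong simplifying assumptions: only a linear bottleneck layer `W` is tuned, each language's back-end is a single unit-variance Gaussian with a fixed mean (standing in for the GMM back-end), an utterance's score is its frame-averaged log-likelihood, and the MMI criterion is the log posterior of the correct language with a hypothetical acoustic scale `kappa`. The names `utterance_scores`, `mmi_objective_and_grad`, `mu` and `kappa` are hypothetical, and all data are synthetic. MMI is maximised here by gradient ascent, which is equivalent to gradient descent on the negative MMI.

```python
import numpy as np

# Hypothetical toy setup: a linear bottleneck layer W (B x D) maps input frames
# x_t (dim D) to bottleneck features z_t = W x_t (dim B). Each language l has a
# fixed unit-variance Gaussian back-end with mean mu[l] (stand-in for a GMM).

def utterance_scores(W, X, mu):
    """Frame-averaged log-likelihood (up to a constant) of utterance X per language."""
    Z = X @ W.T                                          # (T, B) bottleneck features
    diff = Z[:, None, :] - mu[None, :, :]                # (T, L, B)
    return -0.5 * np.mean(np.sum(diff ** 2, axis=2), axis=0)  # (L,)

def mmi_objective_and_grad(W, utts, labels, mu, kappa=1.0):
    """Sum over utterances of log P(correct language | X) and its gradient wrt W."""
    F = 0.0
    G = np.zeros_like(W)
    for X, y in zip(utts, labels):
        s = kappa * utterance_scores(W, X, mu)
        s = s - s.max()                                  # numerical stability
        log_norm = np.log(np.exp(s).sum())
        post = np.exp(s - log_norm)                      # posterior over languages
        F += s[y] - log_norm                             # log posterior of true language
        coeff = -post                                    # dF/ds_l = kappa * (delta_ly - post_l)
        coeff[y] += 1.0
        Z = X @ W.T
        for l in range(mu.shape[0]):
            # d s_l / dW = -mean_t (z_t - mu_l) x_t^T
            ds_dW = -((Z - mu[l]).T @ X) / X.shape[0]
            G += kappa * coeff[l] * ds_dW
    return F, G

# Synthetic data: frames are shifted by a language-dependent offset.
rng = np.random.default_rng(0)
D, B, L, T, N = 20, 4, 3, 50, 30     # input dim, bottleneck dim, languages, frames, utterances
mu = rng.normal(size=(L, B))         # fixed back-end means (hypothetical)
offsets = rng.normal(size=(L, D))
labels = rng.integers(0, L, size=N)
utts = [rng.normal(size=(T, D)) + offsets[y] for y in labels]

W = 0.1 * rng.normal(size=(B, D))    # stands in for pre-trained bottleneck weights
F_start, _ = mmi_objective_and_grad(W, utts, labels, mu)
lr = 5e-5
for it in range(200):
    F, G = mmi_objective_and_grad(W, utts, labels, mu)
    W = W + lr * G                   # gradient ascent on the MMI objective
F_end, _ = mmi_objective_and_grad(W, utts, labels, mu)
print(f"MMI objective: {F_start:.2f} -> {F_end:.2f}")
```

Because every language shares the same identity covariance in this toy, the term quadratic in z cancels in the posterior and the objective is concave in `W`, so a small step size increases it steadily. A real DNN bottleneck scored by GMM back-ends has no such guarantee, and the step size and number of iterations would need tuning.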

Similar resources

Deep Bottleneck Features for Spoken Language Identification

A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for ...

An Investigation of Spoken Output and Intervention Types among Iranian EFL Learners

This study was inspired by VanPatten and Uludag’s (2011) study on the transferability of training via processing instruction to output tasks and Mori’s (2002) work on the development of talk-in-interaction during a group task. An interview was devised as the pretest, posttest, and delayed posttest to compare four intervention types for teaching the simple past passive: traditional intervention ...

End-to-end DNN-CNN Classification for Language Identification

A defining problem in spoken language identification (LID) is how to design effective representations which allow features to be extracted that are specific to language information. Recent advances in deep neural networks for feature extraction have led to significant improvements in results, with deep end-to-end methods proving effective. In this paper, a novel network is proposed and explored...

Deep learning for spoken language identification

Empirical results have shown that many spoken language identification systems based on hand-coded features perform poorly on small speech samples where a human would be successful. A hypothesis for this low performance is that the set of extracted features is insufficient. A deep architecture that learns features automatically is implemented and evaluated on several datasets.

LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification

A key problem in spoken language identification (LID) is how to effectively model features from a given speech utterance. Recent techniques such as end-to-end schemes and deep neural networks (DNNs) utilising transfer learning such as bottleneck (BN) features, have demonstrated good overall performance, but have not addressed the extraction of LID-specific features. We thus propose a novel end-...

Publication date: 2014